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Summary 


We  apply  the  Generalized  Minimal  Residual  (GMRES)  iterative  equation  so¬ 
lution  technique  to  a  set  of  full,  unsymmetric  matrix  systems  generated  by  a 
standard  boundary  element  method.  The  test  problems  chosen  produced  well 
conditioned  matrices.  The  GMRES  technique,  when  used  without  precondi¬ 
tioning  and  with  a  sufficient  number  of  trial  vectors,  solved  the  matrix  system 
using  as  few  as  23%  of  the  operations  required  by  a  direct  Gauss  reduction. 
The  class  of  partial  LU  decomposition  preconditioners  tested  degraded  the 
condition  number  of  the  matrices,  and  consequently  did  not  reduce  the  GM¬ 
RES  solution  time.  In  genereil,  the  GMRES  technique  does  not  appear  to 
be  of  practical  interest  compared  to  the  direct  reduction  unless  other  factors 
(availability  of  a  good  approximation  to  the  final  solution,  etc.)  intervene. 
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1  Introduction 


Iterative  equation  solvers  are  becoming  increasingly  popular  for  certain  spe¬ 
cial  computational  applications.  One  such  application  is  in  solving  large 
sparse  matrix  systems,  such  as  those  generated  in  3-D  finite  element  or  finite 
difference  analysis.  In  these  problems,  the  storage  required  for  the  nonzero 
matrix  entries  may  be  only  .02  [1],  and  similarly  the  number  of  opera¬ 

tions  required  for  a  matrix- vector  multiply  is  significcintly  less  than  (so 
that  we  can  afford  many  iterations).  Another  widely-investigated  application 
for  iterative  methods  is  in  cases  where  the  solution  is  already  approximately 
known,  such  as  in  time- dependent  problems  or  nonlinear  problems.  Finally, 
iterative  solvers  are  generally  more  amenable  to  vectorization/parallehzation 
than  direct  solvers. 

In  this  study,  we  have  examined  whether  a  particular  iterative  solver 
(GMRES  [2])  is  competitive  with  direct  Gauss  solvers  for  full,  unsymmetric 
matrices  derived  from  standard  boundary  integral  techniques.  Here,  we  do 
not  have  the  storage  or  matrix-vector  multiply  advantages  that  occur  in 
solving  sparse  matrices. 

The  Generalized  Minimal  Residual  algorithm  developed  by  Saad  and 
Schultz  [2],  and  presented  in  preconditioned  form  by  Shakib,  Hughes,  and 
Joh2ui  [3],  is  very  similar  to  conjugate  gradient  algorithms.  However,  GM¬ 
RES  is  intended  specifi'aUy  for  unsymmetric  matrices,  it  requires  only  a 
single  matrix- vector  mu  tiply  in  each  iteration,  and  its  rate  of  convergence 
depends  only  on  the  condition  number  of  the  matrix.  (Some  conjugate  gra¬ 
dient  algorithms’  convergence  rates  depend  on  the  square  of  the  condition 
number.)  Brussino  and  Sonnad  [4]  present  an  extensive  survey  of  the  various 
conjugate-gradient  and  GMRES  algorithms.  Their  test  cases  [4,5]  indicate 
that  GMRES  is  comparable  or  superior  to  preconditioned  conjugate  gradient 
algorithms^  for  many  problems. 

Section  2  introduces  the  test  problems  we  investigated,  and  discusses  the 
matrix  condition  numbers.  In  addition,  we  discuss  the  residuad  error  from  a 
direct  solution  and  some  results  for  a  least  squares  conjugate  gradient  solver. 
In  section  3  we  discuss  the  GMRES  algorithm  and  proposed  preconditioner. 

should  be  noted  that  the  ieast-squues  conjugate  gradient  algorithm  discussed  in 
section  2.3  is  apparently  one  of  the  least  efficient  conjugate  gradient  algorithms  for  un- 
synunetric  matrices. 
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Section  4  discusses  our  numerical  results,  and  section  5  details  our  conclu¬ 
sions.  Appendix  A  includes  a  listing  of  the  GMRES  program. 


2  Test  Problem 

To  generate  the  systems  of  equations  used  for  testing,  we  employed  Brebbia’s 
boundary  integral  program  PROGRAMl  [6].  This  program  solves  Laplace’s 
equation 

V2u  =  0  (1) 

in  two  dimensions,  subject  to  the  boundary  conditions  u  =  u  on  boundary 
Fi  and  q  =  q  on  boundary  Fa.  (The  entire  boundary  F  =  Fi  -1-  Fj.)  The 
boundary  integral  equation  is  derived  by  the  method  of  weighted  residuals, 
and  becomes 

where  u'  is  the  solution  for  u  at  point  i,  u*  is  the  Green’s  function  for  u,  and 

q*  =  Constant  elements  are  used  to  generate  a  matrix  system  of  the 

form 

Ax  =  b  (3) 

We  chose  to  examine  one  physical  problem  with  different  aspect  ratios, 
and  produce  successively  finer  boundary  element  discretizations  to  generate 
our  test  ccises.  Figure  1  shows  the  physical  ctise  considered,  and  its  exact 
solution.  We  examined  three  cases:  L=l,  L=10,  and  L=100.  Figure  2  shows 
a  typical  mesh  with  12  unknowns  for  the  square  (L=l)  example. 

2.1  Condition  Number  Results 

It  is  well  known  that  the  rate  of  convergence  of  iterative  methods,  and  there¬ 
fore  the  solution  time,  is  closely  tied  to  the  condition  number  of  the  matrix 
A.  The  accuracy  of  direct  solution  methods  may  be  impacted  by  the  con¬ 
dition  number  of  A,  but  of  course  the  solution  time  is  unaffected.  Table  1 
shows  the  condition  numbers^  for  our  test  problems  with  various  N.  Notice 
that  the  meshes  are  very  well  conditioned. 

^Condition  numbers  estimated  with  LINPAK  subroutine  DGBCO. 
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Figure  1:  Physical  Test  Case  and  Analytical  Solution 


Figure  2:  Typical  Square  (L=l)  Mesh  with  12  Elements 


Number  of 

Inverse  of  Condition  Number 

Elements 

1/k 

_ N _ 

L=1  L=10  L=100 

WKSMM 

4.8  X  10-2  2.0  X  10-^  9.1  X  10"® 

2.4  X  10"^  9.5  X  lO-"*  4.7  x  lO"*^ 

160 

1.1  X  10-2  4.5  X  lO--*  2.3  X  10-® 

320 

5.4  X  10-2  2.1  V  10-“  1.1  X  10-® 

Table  1;  Condition  Number  for  Test  Cases 
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2.2  Direct  Solver  Results 


We  used  the  direct  solver  SLNPD  from  Brebbia  [6].  This  routine  uses  direct 
Gauss  elimination  with  row  interchanges  to  solve  unsymmetric,  non-positive 
definite  matrices.  (Row  interchanges  are  only  employed  when  the  magnitude 
of  the  diagonal  entry  is  less  than  10“^.)  As  is  well  known  [8],  it  requires 
N^/3  floating  point  operations  (flops)^  to  solve  an  TV  x  A'  matrix  system. 
Aside  from  the  storage  required  for  the  origincil  (full)  matrix  and  the  forcing 
vector,  the  routine  requires  no  additional  memory. 

In  choosing  a  convergence  tolerance  for  iterative  methods,  we  obviously 
should  not  request  more  accuracy  than  we  can  obtain  from  a  direct  solution. 
Table  2  shows  the  2- norm  (>/r^r)  of  the  residual  r 

b  —  Ax  =  r  (4) 

computed  with  double  precision  from  the  direct  solution  results.  Notice  that 
the  residual  norm  is  extremely  small  for  the  direct  solution. 

2.3  Least-Squares  Conjugate  Gradient  Results 

As  a  basis  for  comparison,  table  3  shows  the  ratio  of  the  flops^  for  the  least- 
squares  conjugate  gradient  method  to  the  flops  for  a  direct  solution.  (The 
least  squares  conjugate  gradient  algorithm  has  been  successfully  used  in  nu¬ 
merous  other  applications  at  University  of  Michigan  [7].)  Convergence  is 
assumed  to  occur  when  the  norm  of  the  residual  is  reduced  by  a  factor  (ctoi) 
of  10"*.  Notice  that  the  unpreconditioned  conjugate  gradient  method  is 
never  competitive  with  direct  solutions  for  the  test  problem  considered. 


^“...a  flop  roughly  constitutes  the  effort  of  doing  a  floating  point  add,  a  floating  point 
multiply,  and  a  little  subscripting.”  [8],  p.  32 

^Each  iteration  requires  2TV*  operations  for  large  TV. 
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Number  of 
Elements 
N 


Norm  of  Residual 

l|r|| 


Table  2:  Residual  Error  for  Test  Cases 


Elements 

N 


Iterative  /  D  irect 
flops 

L=1  L=10  L=100 

1.95 

3.00 

5.55 

2.03 

4.65 

— 

2.06 

— 

— 

1.61 

— 

— 

Table  3:  Conjugate  Gradient  Results  for  Test  Cases 


3  GMRES  Solver 


We  implement  the  preconditioned  GMRES  algorithm  with  restarts  as  de¬ 
scribed  in  Shakib,  Hughes,  and  Johan  [3]  (see  Figure  3,  which  is  reproduced 
directly  from  their  paper).  Appendix  A  contains  the  GMRES  subroutine 
listing.  The  preconditioned  system  is  assumed  to  have  the  form 

(L-'AU-‘)(Ux)  =  (L-'b)  (5) 

In  Figure  3,  k  is  the  number  of  vectors  we  use  before  restarting  the  algorithm 
(initiating  a  new  GMRES  cycle).  l(  k  =  N,  then  the  algorithm  will  converge 
in  one  cycle.  We  also  must  supply  a  convergence  tolerance  {etoi)  for  the  norm 
of  the  preconditioned  residual,  and  a  maximum  number  of  cycles  (Imax)- 

We  examined  the  performance  of  the  GMRES  algorithm  with  no  pre¬ 
conditioner  (L=U=I)  and  with  a  partial  L  U  factorization.  Because  of  the 
nature  of  the  boundary  integral  technique,  it  seems  reasonable  to  expect  that 
the  points  which  are  physically  closest  to  point  i  on  the  mesh  will  have  the 
dominant  effect  on  the  solution  at  point  i.  We  therefore  select  a  number  of 
neighboring  points  m  to  include  in  a  banded  approximation  to  A  (which  has 
the  same  structure  as  a  finite  element  mesh).  This  approximate  A  matrix 
A  is  then  decomposed  to  form  L  U.  A  diagonal  preconditioner  corresponds 
to  m=0.  Figure  4  shows  a  schematic  of  the  banded  factorization  scheme  for 
m=l  and  m  =  2.  Note  that  both  L  and  U  aie  now  tightly  banded  matrices, 
and  it  is  possible  to  take  advantage  of  this  fact  in  storing  and  inverting  them. 

The  number  of  floating  point  operations  required  to  solve  the  system  of 
equations  obviously  depends  on  the  number  of  cycles  I  required,  the  number 
of  vectors  k  chosen,  and  the  size  of  the  band  m  chosen  for  the  preconditioner. 
If  the  system  has  no  preconditioner,  it  requires  roughly  (^  -I-  1)N^  flops  for 
each  of  the  /  cycles.  (This  figure  neglects  the  costs  associated  with  all  dot 
products,  the  Q-R  algorithm,  solving  for  y,  and  updating  the  solution.)  If 
the  procedure  terminates  within  a  cycle  (with  less  than  k  vectors  used),  the 
number  of  flops  will  be  less.  If  the  system  is  preconditioned,  then  we  need 
(fc  -h  l)(Ar*  -|-  2mN)  flops  which  is  of  the  same  order  so  long  as  m  does  not 
need  to  grow  with  N. 

The  storage  required  for  the  GMRES  algorithm  is  modest  so  long  as  k 
remains  small.  We  must  store  k  vectors  of  length  N  for  the  iterations,  plus 
approximately  entries  in  the  h  matrix.  In  addition,  the  two  preconditioner 
matrices  each  require  mN  storage  spaces. 
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Box  1  -  Preconditioned  GMRES  Algorithm. 


Given  i,  £toi  and  l-oMxt  proceed  as  follows: 


(Initialization) 
hir-L^^h 
S  =  ^tol  Pll 


X  =  0 


(GMRES  cycles) 
For  /  =  1, . . . , 


ui  =b-L-^AU-^x 

ei  = 


(GMRES  iteration) 

For  t  = 

lij+i  =  L^^AU~^Ui 

(Modified  Gram-Schmidt  orthogonalization) 

For  j  =  1, . . . ,  i 

A+i,j  =  («i+i,«i) 

«i+l  +-  Wj+i  —  0i^ijUj 

(End  modified  Gram-Schmidt  orthogonalization) 
..  .,^,>1,,,  l|tt, -4.111}^ 

tit+i 

Wi+l  ♦-  ii - 17 

(Q-R  algorithm) 

For  j  =  1, . . . ,  t  -  1 


Cj  Sj  n- 

—Si  Ci 

^  ^  J  ">+i 


Figure  3:  Preconditioned  GMRES  Algorithm  (continued  on  next  page) 
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(End  j  loop) 

-■  =  +  (Klif 


0 


Ci-fl  —  ^iCi 


Cj  i—  CjCj 

(End  Q-R  algorithm) 

Convergence  check:  If  [cj+il  <  e.  Exit  i  loop 


(End  GMRES  iteration) 
Solve  for  y: 


Update  solution: 


x*-x  +  ^  VjUj 

J-i 

Convergence  check:  If  |e,+il  <  e.  Exit  /  loop 
(End  GMRES  cycles) 

X  <—  U~^x 
Return 


j. 


xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 

xxxxxxxxxxxx 


XX  X 

XXX  X 

XXX  X 

XXX  X 

XXX  X 

XXX  X 

XXX  X 

XXX  X 


XXX  X 

X  X  X  X 
XXX 

xxxxxxxxxxxx 


A 

(ni=l) 
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4  GMRES  Results 

4.1  No  Preconditioner 

Tables  4  to  7  show  the  results  of  applying  the  GMRES  aJgorithm  without 
preconditioning  to  the  test  problem  considered.  The  convergence  tolerance 
ttoi  was  set  at  10~®.  Note  that  if  the  maximum  number  of  cycles  Imax  is  greater 
them  N/{3{k  + 1)),  then  the  GMRES  algorithm  with  one  cycle  requires  more 
flops  than  the  direct  solver.  Consequently,  we  set  that  as  an  upper  limit  for 
our  tests  for  1.  The  notation  “ — ”  in  the  table  indicates  that  the  solution 
did  not  converge  within  the  allowed  cycle  limits.  In  addition,  the  table  shows 
the  ratio  of  the  number  of  iterative  flops®  to  the  direct  flops.  Notice  that  the 
GMRES  method  becomes  more  competitive  with  the  direct  solution  as  the 
matrices  become  larger.  Also  it  is  clear  that  the  minimum  in  flops  occurs 
when  k  is  large  enough  that  the  routine  does  not  restart,  i.e.  when  only  one 
cycle  is  used.  Unfortunately,  this  corresponds  to  the  most  storage- intensive 
use  of  the  algorithm.  Finally,  we  observe  that  the  unpreconditioned  algorithm 
is  not  significantly  faster  than  the  direct  solve. 


®As8umed  to  be  (/  —  1)(A  +  1)A^*  +  (*+  where  i  is  the  number  of  iterations  in  the 
last  cycle. 


12 


#  of  vectors 
k 

/ 

L= 

i 

1 

I/D 

|H|n5Tn|H 

/ 

L= 

i 

LOO 

I/D 

1  -  7 

— 

— 

>1.0 

— 

— 

>1.0 

— 

— 

>1.0 

8 

2 

1 

0.83 

— 

— 

>1.0 

— 

— 

>1.0 

9 

1 

9 

0.75 

2 

8 

1.42 

— 

— 

>1.0 

10 

1 

9 

0.75 

2 

3 

1.13 

2 

10 

1.65 

11 

1 

9 

0.75 

1 

11 

0.90 

2 

2 

1.13 

12-  N 

1 

9 

0.75 

1 

11 

0.90 

1 

12 

0.98 

Table  4:  GMRES  with  No  Preconditioner,  N=40 
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#  of  vectors 
k 

/ 

L= 

i 

1 

I/D 

/ 

L= 

i 

10 

I/D 

/ 

L= 

i 

100 

I/D 

1  -  5 

— 

— 

>1.0 

— 

— 

>1.0 

— 

— 

>1.0 

6 

4 

6 

1.05 

— 

— 

>1.0 

— 

— 

>1.0 

7 

4 

1 

0.98 

— 

— 

>1.0 

— 

— 

>1.0 

8 

3 

6 

0.94 

— 

— 

>1.0 

— 

— 

>1.0 

9 

2 

9 

0.75 

— 

— 

>1.0 

— 

— 

>1.0 

10 

2 

6 

0.68 

— 

— 

>1.0 

— 

— 

>1.0 

11 

2 

3 

0.60 

— 

— 

>1.0 

— 

— 

>1.0 

12 

2 

1 

0.64 

— 

— 

>1.0 

— 

— 

>1.0 

13 

1 

14 

0.60 

— 

— 

>1.0 

— 

— 

>1.0 

14 

1 

14 

0.56 

— 

— 

>1.0 

— 

— 

>1.0 

15 

1 

14 

0.56 

— 

— 

>1.0 

— 

— 

>1.0 

16 

1 

14 

0.56 

2 

10 

1.05 

— 

— 

>1.0 

17 

1 

14 

0.56 

2 

10 

1.09 

— 

— 

>1.0 

18 

1 

14 

0.56 

2 

11 

1.16 

— 

— 

>1.0 

19 

1 

14 

0.56 

2 

3 

0.90 

— 

— 

>1.0 

20 

1 

14 

0.56 

1 

20 

0.79 

2 

20 

1.58 

21 

1 

14 

0.56 

1 

20 

0.79 

2 

15 

1.42 

22 

1 

14 

0.56 

1 

20 

0.79 

2 

15 

1.46 

23 

1 

14 

0.56 

1 

20 

0.79 

2 

11 

1.35 

24 

1 

14 

0.56 

1 

20 

0.79 

2 

11 

1.39 

25 

1 

14 

0.56 

1 

20 

0.79 

2 

6 

1.24 

26 

1 

14 

0.56 

1 

20 

0.79 

— 

— 

>1.00 

27-  N 

1 

14 

0.56 

1 

20 

0.79 

1 

27 

1.05 

Table  5:  GMRES  with  No  Preconditioner,  N=80 
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#  of  vectors 
k 


L=100 
/  i  I/D 


1  -  6 

— 

— 

>1.0 

— - 

— 

>1.0 

— 

— 

>] 

7 

6 

3 

0.83 

— 

— 

>1.0 

— 

— 

>] 

8 

5 

1 

0.71 

— 

— 

>1.0 

— 

— 

> 

9 

4 

5 

0.68 

— 

— 

>1.0 

— 

— 

>] 

10 

3 

10 

0.62 

— 

— 

>1.0 

— 

— 

>] 

11 

3 

5 

0.56 

— 

— 

>1.0 

— 

— 

>] 

12 

3 

1 

0.53 

— 

— 

>1.0 

— 

— 

>] 

13 

2 

13 

0.53 

— 

— 

>1.0 

— 

— 

>] 

14 

2 

12 

0.53 

4 

11 

1.07 

— 

— 

>] 

15 

2 

10 

0.51 

4 

7 

1.05 

— 

— 

>] 

16 

2 

7 

0.47 

3 

15 

0.94 

— 

— 

>] 

17 

2 

3 

0.41 

3 

12 

0.92 

— 

— 

>] 

18 

2 

3 

0.43 

3 

11 

0.94 

— 

— 

>] 

19 

2 

1 

0.41 

3 

7 

0.90 

— 

— 

>] 

20 

1 

20 

0.39 

3 

3 

0.86 

— 

— 

>] 

21 

1 

20 

0.39 

2 

19 

0.79 

— 

— 

>] 

22 

1 

20 

0.39 

2 

18 

0.79 

— 

— 

>] 

23 

1 

20 

0.39 

2 

18 

0.81 

— 

— 

>] 

24 

1 

20 

0.39 

2 

16 

0.79 

— 

— 

>] 

25 

1 

20 

0.39 

2 

14 

0.77 

— 

— 

>] 

26 

1 

20 

0.39 

2 

13 

0.77 

— 

— 

>] 

27 

1 

20 

0.39 

2 

13 

0.79 

— 

— 

>] 

28 

1 

20 

0.39 

2 

10 

0.75 

— 

— 

>] 

29 

1 

20 

0.39 

2 

8 

0.73 

— 

— 

>] 

30 

1 

20 

0.39 

2 

6 

0.71 

— 

— 

>] 

Table  6:  GMRES  with  No  Preconditioner,  N=160  (continued  on  next  page) 
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#  of  vectors 
k 

/ 

L= 

i 

1 

I/D 

/ 

31 

1 

20 

0.39 

2 

32 

1 

20 

0.39 

1 

33 

1 

20 

0.39 

1 

34 

1 

20 

0.39 

1 

35 

1 

20 

0.39 

1 

36 

1 

20 

0.39 

1 

37 

1 

20 

0.39 

1 

38 

1 

20 

0.39 

1 

39 

1 

20 

0.39 

1 

40 

1 

20 

0.39 

1 

41 

1 

20 

0.39 

1 

42 

1 

20 

0.39 

1 

43-  N 

1 

20 

0.39 

1 

4.2  Banded  Preconditioner 

We  initially  tested  the  preconditioner  on  the  L=1  cases,  which  are  the  most 
well-conditioned  systems,  with  disappointing  results.  The  diagonal  precon¬ 
ditioner  (m=0)  had  essentially  no  effect,  since  the  diagonals  of  the  original 
matrix  are  almost  all  equal.  A  full  LU  decomposition  {m=N  —  1)  caused  the 
iteration  to  converge  in  exactly  one  step,  as  expected.  However,  other  val¬ 
ues  of  m  actuadly  degraded  the  performance  of  the  algorithm.  For  instance, 
m=l  (a  psuedo-tridiagonal  preconditioner)  caused  slower  convergence  for  N 
=  40,80,  and  160.  Larger  values  of  m  also  caused  slower  convergence  than 
no  preconditioner.  Checking  the  condition  number  of  the  preconditioned 
matrices  showed  that  it  was,  in  fact,  worse  than  for  the  original  matrices. 
One  possible  explanation  for  this  is  that  the  condition  number  of  the  CT?ginal 
matrices  is  already  so  good  that  little  heuristic  improvement  is  possible. 

As  a  second  check,  we  examine  the  effect  of  the  preconditioner  on  the 
L=100,  A^=80  matrix,  which  converges  poorly  and  has  a  condition  number 
of  4.7  X  10~®  in  its  original  form.  Using  a  preconditioner  with  m=l  to  10, 
we  find  no  convergence  at  all  in  the  allowed  number  of  iterations. 

5  Conclusions 

The  GMRES  technique  requires  somewhat  fewer  flops  than  a  direct  solve  as 
the  full,  unsymmetric  boundary  integral  matrices  we  tested  become  large. 
However,  the  ratio  was  only  .23  at  best.  The  heuristic  preconditioner  we 
tested  did  not  improve  the  convergence.  In  general,  unless  a  good  precondi¬ 
tioner  or  a  good  approximation  to  the  solution  x  is  available,  the  additional 
storage  and  uncertainty  associated  with  the  iterative  solver  make  it  imprac¬ 
tical  for  full  matrices. 
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#  of  vectors 
k 

/ 

L= 

t 

1 

I/D 

/ 

L= 

i 

10 

I/D 

1  -  5 

— 

— 

>1.0 

— 

— 

>1.0 

6 

15 

3 

0.96 

— 

— 

>1.0 

7 

10 

4 

0.72 

— 

— 

>1.0 

8 

7 

6 

0.57 

— 

— 

>1.0 

9 

5 

8 

0.46 

— 

— 

>1.0 

10 

4 

17 

0.38 

— 

— 

>1.0 

11 

4 

3 

0.38 

— 

— 

>1.0 

12 

3 

8 

0.33 

8 

10 

0.96 

13 

3 

5 

0.32 

7 

9 

0.88 

14 

3 

1 

0.30 

6 

10 

0.81 

15 

2 

13 

0.28 

5 

14 

0.74 

16 

2 

13 

0.29 

5 

6 

0.70 

17 

2 

12 

0.29 

4 

14 

0.65 

18 

2 

8 

0.26 

4 

11 

0.65 

19 

2 

5 

0.24 

4 

8 

0.65 

20 

2 

5 

0.25 

3 

19 

0.58 

21 

2 

3 

0.24 

3 

17 

0.58 

22 

2 

1 

0.23 

3 

15 

0.58 

23 

1 

23 

0.23 

3 

12 

0.57 

24 

1 

23 

0.23 

3 

7 

0.54 

25 

1 

23 

0.23 

3 

3 

0.52 

26 

1 

23 

0.23 

2 

25 

0.50 

27 

1 

23 

0.23 

2 

24 

0.50 

28 

1 

23 

0.23 

2 

23 

0.50 

29 

1 

23 

0.23 

2 

23 

0.51 

30 

1 

23 

0.23 

2 

22 

0.51 

31 

1 

23 

0.23 

2 

21 

0.51 

32 

1 

23 

0.23 

2 

19 

0.50 

33 

1 

23 

0.23 

2 

17 

0.49 

34 

1 

23 

0.23 

2 

16 

0.49 

35 

1 

23 

0.23 

2 

14 

0.48 

L=100 
:  I/D 


Table  7:  GMRES  with  No  Preconditioner,  N=320,  (continued  next  page) 
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#  of  vectors 
k 

/ 

L= 

t 

1 

I/D 

/ 

L= 

i 

10 

I/D 

36 

1 

23 

0.23 

2 

13 

0.48 

— 

— 

>1.0 

37 

1 

23 

0.23 

2 

11 

0.47 

— 

— 

>1.0 

38 

1 

23 

0.23 

2 

9 

0.46 

— 

— 

>1.0 

39 

1 

23 

0.23 

2 

5 

0.43 

— 

— 

>1.0 

40 

1 

23 

0.23 

2 

5 

0.44 

— 

— 

>1.0 

41 

1 

X 

23 

0.23 

1 

41 

0.39 

— 

— 

>1.0 

42 

1 

23 

0.23 

1 

41 

0.39 

— 

— 

>1.0 

43 

1 

23 

0.23 

1 

41 

0.39 

3 

42 

1.23 

44 

1 

23 

0.23 

1 

41 

0.39 

3 

42 

1.25 

45 

1 

23 

0.23 

1 

41 

0.39 

3 

33 

1.18 

46 

1 

23 

0.23 

1 

41 

0.39 

3 

34 

1.21 

47 

1 

23 

0.23 

1 

41 

0.39 

3 

30 

1.19 

48 

1 

23 

0.23 

1 

41 

0.39 

3 

27 

1.18 

49 

1 

23 

0.23 

1 

41 

0.39 

2 

49 

0.94 

50 

1 

23 

0.23 

1 

41 

0.39 

3 

10 

1.06 

51 

1 

23 

0.23 

1 

41 

0.39 

2 

50 

0.97 

52 

1 

23 

0.23 

1 

41 

0.39 

2 

50 

0.98 

53 

1 

23 

0.23 

1 

41 

0.39 

2 

48 

0.97 

54 

1 

23 

0.23 

1 

41 

0.39 

2 

41 

0.91 

55 

1 

23 

0.23 

1 

41 

0.39 

2 

42 

0.93 

56 

1 

23 

0.23 

1 

41 

0.39 

2 

42 

0.94 

57 

X 

23 

0.23 

1 

41 

0.39 

2 

34 

0.87 

58 

1 

23 

0.23 

1 

41 

0.39 

2 

34 

0.88 

59 

1 

23 

0.23 

1 

41 

0.39 

2 

33 

0.88 

60 

1 

23 

0.23 

1 

41 

0.39 

2 

33 

0.89 

61 

1 

23 

0.23 

1 

41 

0.39 

2 

24 

0.82 

62 

1 

23 

0.23 

1 

41 

0.39 

2 

23 

0.82 

63 

1 

23 

0.23 

1 

41 

0.39 

2 

21 

0.81 

64 

1 

23 

0.23 

1 

41 

0.39 

2 

22 

0.82 

65 

1 

23 

0.23 

1 

41 

0.39 

2 

13 

0.75 

66 

1 

23 

0.23 

1 

41 

0.39 

2 

11 

0.74 

67 

1 

23 

0.23 

1 

41 

0.39 

2 

13 

0.77 

68  -  N 

1 

23 

0.23 

1 

41 

0.39 

1 

68 

0.65 

19 
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A  GMRES  Program  Listings 

The  routines  listed  below  are  available  on  the  Apollo  network.  The  main 
routine  is  in  //kilroy/users/olson/gmres/gmres.ftn  and  the  utility  routines 
are  in  //kilroy/users/olson/gmres/util.ftn. 

SUBROUTINE  GMRES  (AK, X,B,AU,AL.Z,U.EBAR. BETA, HEAR, G.S.Y, 

1  N, NX. K.KPl.TOL.MAX, NITER. NK) 

C 

C  USE  THE  GENERALIZED  MINIMAL  RESIDUAL  METHOD  TO  SOLVE 
C  AK*X*R,  WHERE  AK  IS  A  FULL  UNSYMMETRIC  N  X  N 

C  MATRIX  (SEE  SHAKIB.  HUGHES.  JOHAN,  P17) 

C 

C  INPUTS : 

C  AK  (N.N) 

C  X(N) 

C  B(N) 

C  AU(N,N) 

C 
C 

C  ALCN.N) 

C 
C 

C  Z(N) 

C  U(N,K-*-l) 

C  EBAR(K+1) 

C  BETA(K+1.K) 

C  HBAR(K+1,K) 

C  C(K) 

C  S(K) 

C  Y(K) 

C  N 

C  NX 

C  K 

C  KPl 

C  TOL 

C  MAX 

C 

C  OUTPUTS : 


»  STIFFNESS  MATRIX 
=  SOLUTION  VECTOR 
«  LOAD  VECTOR 
»  UPPER  TRIANGULAR 
PRECONDITIONING  MATRIX 

(STORED  AS  A  FULL  MATRIX  FOR  TESTING  PURPOSES) 

*  LOWER  TRIANGULAR 
PRECONDITIONING  MATRIX 

(STORED  AS  A  FULL  MATRIX  FOR  TESTING  PURPOSES) 
»  WORKING  VECTOR 

*  WORKING  VECTOR 
=  WORKING  VECTOR 
=  WORKING  VECTOR 
«  WORKING  VECTOR 

*  WORKING  VECTOR  (COSINES  IN  Q-R) 

»  WORKING  VECTOR  (SINES  IN  Q-R) 

=  WORKING  VECTOR  (WEIGHTS  FOR  U) 

«  ACTUAL  DIMENSION  OF  STIFFNESS  MATRIX 
»  STATED  DIMENSION  OF  STIFFNESS  MATRIX 
=  NUMBER  OF  VECTORS  IN  GMRES  ITERATION 

*  K+l 

=  TOLERANCE  FOR  CONVERGENCE  CHECK 
s  MAXIMUM  NUMBER  OF  CYCLES  ALLOWED 
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c  X(N)  =  SOLUTION  VECTOR 

C  NITER  »  ACTUAL  NUMBER  OF  CYCLES  USED 

C  NK  =  NUMBER  OF  VECTORS  REQUIRED  IN  LAST  ITERATION 

C 

IMPLICIT  REAL+8  (A-H.O-Z) 

DIMENSION  AK(NX.*) ,X(*) .B(*) .AU(NX,*) .AL(NX.*) .Z(*)  . 

1  U(NX.*),EBAR(*).BETA(KP1,*).HBAR(KP1.*).C(*).S(*) .Y(*) 

C 

C  INITIALIZE  VARIABLES 
C 

RZER0=1.0E-20 
DO  10  1=1, N 

DO  5  J=1,I-1 

B(I)=B(I)  -  AL(I,J)*B(J) 

5  CONTINUE 
B(I)=B(I)/AL(I,I) 

10  CONTINUE 

CALL  DOT  (B.B.N.EPS) 

EPS=TOL*SQRT(EPS) 

CALL  ZERO  (X.N) 

CALL  ZERO  (Z.N) 

NMATEL=N*KP1 

CALL  ZERO  (U.NHATEL) 

CALL  ZERO  (EBAR.KPl) 

NMATEL»K*KP1 

CALL  ZERO  (BETA.NMATEL) 

CALL  ZERO  (HBAR.NMATEL) 

CALL  ZERO  (C.K) 

CALL  ZERO  (S.K) 

CALL  ZERO  (Y.K) 

C 

C  ITERATE  AT  MOST  MAX  TIMES  (GMRES  CYCLES) 

C 

DO  800  L=1,MAX 
NITER=L 

CALL  ZERO  (Z.N) 

CALL  PREMUL  (AL,AK,AU,X,U(1,1) ,N.NX,Z) 

DO  310  INDEX=1,N 

U(INDEX,1)»B(INDEX)-Z(INDEX) 
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310 


320 


C 

C 

C 


C 

C 

C 


1 

325 

330 

C 

C 

C 


340 


350 


CONTINUE 

CALL  DOT  (U(1.1),U(1.1),N.EBAR(1)) 
EBAR(l)=sqRT(EBAR(l)) 

DO  320  INDEX=1.N 

U(INDEX.1)=U(INDEX,1)/EBAR(1) 

CONTINUE 

GMRES  ITERATION 

DO  400  1=1, K 
NK»I 

CALL  ZERO  (U<1,I+1).N) 

CALL  PREMUL  (AL.AK,AU.U(1,I) .Z.N.NX.U(1.I+1)) 
MODIFIED  GRAM-SCHMIDT  ORTHOGONALIZATION 
DO  330  J=1,I 

CALL  DOT  (U(1,I*1),U(1,J),N,BETA(I+1.J)) 
DO  325  INDEX* 1,N 

U(INDEX.I+1)=U(INDEX,I+1)  - 

BETA(I+1,J)*U(INDEX,J) 

CONTINUE 

CONTINUE 

END  MODIFIED  GRaM-SCHMIDT  ORTHOGONALIZATION 

CALL  DOT  (U(1,I+1),U(1,I+1),N,TEMP) 
TEMP=SQRT(TEMP) 

DO  340  INDEX=1,I 

HBAR( INDEX , 1) =BETA (1+ 1 , INDEX) 

CONTINUE 

HBAR(I+1,I)»TEMP 

IF  (ABS(TEMP).GT.1.0E-20)  TEMP=1 .0/TEMP 
DO  350  INDEX-1 ,N 

U(INDEX,I+1)«U(INDEX,I+1)*TEMP 

CONTINUE 


C 

C 

C 


Q-R  ALGORITHM 
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DO  360  J=1,I-1 

TEMP1=C(J)*HBAR(J,I)  +  S(J)*HBAR( J+1 .1) 
TEMP2=-S(J)*HBAR(J.I)  +  C(J)*HBAR(J-»-l  ,1) 
HBAR(J.I)=TEMP1 
HBAR(J+1,I)=TEMP2 
360  CONTINUE 

R=SQRT(HBAR(I,1)*HBAR(I.I)  +  HBAR(I+1.I)*HBAR(I+1 .1)) 

C(I)=HBAR(I.I)/R 

S(I)=HBAR(I+1.I)/R 

HBAR(I,I)=R 

HBAR(I+1.I')=0.0 

EBAR(I+1)=-S(I)*EBAR(I) 

EBAR(I)=C(I)*EBAR(I) 

C 

C  END  Q-R  ALGORITHM 

C  CHECK  FOR  CONVERGENCE 

C 

IF  (ABS(EBAR(I+1)) .LT.EPS)  GO  TO  500 
C 

C  END  GMRES  ITERATION? 

C 

400  CONTINUE 

500  CONTINUE 

C 

C  SOLVE  FOR  ^ 

C 

CALL  ZERO  (Y.K) 

DO  510  INDEX*NK,1,-1 

Y(INDEX)»EBAR(INDEX) 

DO  505  IND»INDEX+1,K 

Y(INDEX)*Y(INDEX)  -  Y (IND )*HBAR( INDEX, IND) 

505  CONTINUE 

IF  (ABSCHBARCINDEX, INDEX) ).LT.RZERO)  THEN 
Y(INDEX)=0,0 

ELSE 

Y (INDEX) *Y ( INDEX) /HBAR (INDEX , INDEX) 

ENDIF 

510  CONTINUE 
C 
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c  UPDATE  SOLUTION 

C 

DJ  650  J=1,K 

DO  600  INDEX*1,N 

X(INDEX)=X(INDEX)+  Y( J)*U( INDEX, J) 

600  CONTINUE 

650  CONTINUE 

C 

C  CHECK  FOR  CONVERGENCE 

C 

IF  (ABS(EBAR(NK+1)).LT.EPS)  GO  TO  900 
C 

C  END  GMRES  CYCLES.  FAILURE  TO  CONVERGE 

C 

800  CONTINUE 

WRITE  (*,*)  ‘SOLUTION  DID  NOT  CONVERGE  IN' , MAX, ‘CYCLES' 
WRITE  (*.*)  ‘X*‘,(X(I),I=1.N) 

STOP 

C 

C  SOLUTION  CONVERGED 

C 

900  CONTINUE 

DO  920  I*N,1,-1 

DO  910  JaI+l,N 

X(I)=X(I)  -  AU(I,J)*X(J) 

910  CONTINUE 

X(I)=X(I)/AU(I.I) 

920  CONTINUE 
RETURN 
END 


SUBROUTINE  DOT  (W1 ,W2,N,TEMP) 

C 

C  FIND  THE  DOT  PRODUCT  OF  TWO  VECTORS 

C 

C  INPUTS: 
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C  Wl(N)  =  FIRST  VECTOR 

C  W2(N)  =  SECOND  VECTOR 

C  N  =  LENGTH  OF  VECTORS 

C 

C  OUTPUTS: 

C  TEMP  =  W*H 

C 

IMPLICIT  REAL*8  (A-H,0-2) 

DIMENSION  H1(N).W2(N) 

C 

TEMP=0.0 
DO  100  1=1, N 

TEMP=TEMP  +  W1(I)*W2(I) 

100  CONTINUE 
RETURN 
END 
C 

SUBROUTINE  ZERO  (X.N) 

C 

C  SET  X  TO  ZERO 

C 

C  INPUTS: 

C  X(N)  =  VECTOR  TO  BE  ZEROED 

C  N  «  SIZE  01  VECTOR 

C 

C  OUTPUTS: 

C  X  =0 

C 

IMPLICIT  REALMS  (A-H,0-Z) 

DIMENSION  X(N) 

C 

DO  100  1=1, N 
X(I)=0.0 
100  CONTINUE 
C 

RETURN 

END 

C 

SUBROUTINE  PREMUL  (AL,AK,AU,X,H,N,NX,Z) 
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c 

C  FORM  Z=(AL)*-1  *  AK  ♦  (AU)‘-1  ♦  X 
C 

C  INPUTS : 

C  AL(N.N)  =  LOWER  DIAGONAL  PRECONDITIONING  MATRIX 

C  AK(N,N)  =  STIFFNESS  MATRIX 

C  AU(N.N)  =  UPPER  DIAGONAL  PRECONDITIONING  MATRIX 

C  X(N)  =  SOLUTION  VECTOR 

C  W(N)  =  WORKING  VECTOR 

C  N  =  SIZE  OF  ARRAYS 

C  NX  =  DECLAPXD  SIZE  OF  ARRAYS 

C 

C  OUTPUTS: 

C  Z(N)  =  AL‘-1  *  AK  *  AU--1  *  X 

C 

C 

IMPLICIT  REAL+8  (A-H,0-Z) 

DIMENSION  AL(NX,*) ,AK(NX,*) ,AU(NX,*) ,X(*) ,W(*) ,Z(*) 
C 

C  AU‘-1  •  X 

C 

DO  20  I=N,1.-1 
W(I)»X(I) 

DO  10  J=I+1,N 

W(I)=W(I)  -  AU(I,J)*W(J) 

10  CONTINUE 

W(I)=W(I)/AU(I,I) 

20  CONTINUE 
C 

C  AK*  AU“-1  *X 

C 

DO  100  1=1, N 

Z(I)=0.0 
DO  50  J=1,N 

Z(I)=Z(I)  +  AK(I.J)*W(J) 

50  CONTINUE 

100  CONTINUE 
C 

C  AL“-1  *  AK  ♦  AU'~1  *X 
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c 

DO  200  1=1, N 

DO  150  J=1,I-1 

Z(I)=Z(I)  -  AL(I.J)*Z(J) 
150  CONTINUE 

Z(I)=Z(I)/AL(I,I) 

200  CONTINUE 
C 

RETURN 

END 
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